Genomic prediction in multi-environment trials in maize using statistical and machine learning methods

In the context of multi-environment trials (MET), genomic prediction is proposed as a tool that allows the prediction of the phenotype of single cross hybrids that were not tested in field trials. This approach saves time and costs compared to traditional breeding methods. Thus, this study aimed to evaluate the genomic prediction of single cross maize hybrids not tested in MET, grain yield and female flowering time. We also aimed to propose an application of machine learning methodologies in MET in the prediction of hybrids and compare their performance with Genomic best linear unbiased prediction (GBLUP) with non-additive effects. Our results highlight that both methodologies are efficient and can be used in maize breeding programs to accurately predict the performance of hybrids in specific environments. The best methodology is case-dependent, specifically, to explore the potential of GBLUP, it is important to perform accurate modeling of the variance components to optimize the prediction of new hybrids. On the other hand, machine learning methodologies can capture non-additive effects without making any assumptions at the outset of the model. Overall, predicting the performance of new hybrids that were not evaluated in any field trials was more challenging than predicting hybrids in sparse test designs.

Maize (Zea mays) has emerged as an important crop for food, feed production, and various industrial applications, providing livelihoods for millions of people around the world 1,2 .However, its production is affected by several factors, with drought being one of the most common causes of agricultural shortages in rainfed systems 3 .This fact, combined with the high demand for this crop and the prospect of a worldwide growth of more than 2 billion people over the next 20 years 4 , makes it necessary to cultivate increasingly productive crops, as well as more adapted to climate change and also to different planting regions, such tropical conditions.
Cultivars can exhibit differentiated phenotypic responses between environments, and it is possible that a genotype may perform well in one environment but not in another [5][6][7] .To address this, breeders must submit the developed hybrids to multiple environment trials (MET).In MET, the main objectives are to study the interaction between genotypes and environments and to evaluate genotypic overall performance and stability 8 .However, MET phenotyping faces challenges such as the limited seeds availability, a high number of genotypes to be tested in the preliminary trials, and the associated costs, resulting in unbalanced experimental designs in different environments 9,10 .In sparse designs, where hybrids are not evaluated in all environments, accurately selecting superior hybrids for the next cycle can be difficult, as some hybrids may not be stable in many environments, and other genotypes that are discarded may outperform in untested environments.
To address these challenges, genomic prediction (GP) is proposed as a tool to predict the genetic value of individuals that were not evaluated in the field 11,12 .Several GP methods have been proposed, with Genomic Best Linear Unbiased Predictor (GBLUP) being one of the most commonly used methods.In the context of MET, predicting the genetic value of individuals not observed in specific environments has led to the development

Phenotypic data
The data are composed of 265 single cross hybrids from the maize breeding program of Embrapa Maize and Sorghum evaluated in eight combinations of trials/locations/years under irrigated trials (WW) and water stress (WS) conditions at two locations in Brazil (Janaúba-Minas Gerais and Teresina-Piauí) over two years (2010  and 2011).The hybrids were obtained from crosses between 188 inbred lines and two testers.The inbred lines belong to heterotic groups: dent (85 inbred lines), flint (86 inbred lines), and an additional group, referred to as group C (17 inbred lines), which is unrelated to the dent and flint origins.The two testers are inbred lines belonging to the flint (L3) and dent (L228-3) groups.Among the inbred lines, 120 were crossed with both testers, 52 were crossed with the L228-3 tester only, and 16 lines were crossed with the L3 tester only.Silva et al. (2020)  evaluated the genetic diversity and heterotic groups in the same database.These authors showed the existence of subgroups within each heterotic group.Therefore, once these groups were not genetically well defined and the breeding program from Embrapa Maize was in the beginning, the effect of allelic substitution in both groups are assumed to be the same.More details on the experimental design and procedures can be found in Dias et al. 13,30 .
The experiment originally included 308 entries, but hybrids that were not present in all environments were also removed to evaluate the genomic prediction within each environment, resulting in a total of 265 hybrids for analysis.Each trial consisted of 308 maize single cross hybrids, randomly divided into six sets: sets 1-3 for crosses with L3 (61, 61, and 14 hybrids each), and sets 4-6 for crosses with L228-3 (80, 77, and 15 hybrids each).Four checks (commercial maize cultivars) were included in each set, and the experiment was designed in completely randomized blocks.Between trials, hybrids within each set remained the same, but hybrids and checks were randomly allocated into groups of plots within each set.This allocation varied between replicates of sets and between trials.The WS trials had three replications, except for the set containing 15 hybrids and the trials evaluated in 2010, which had two replications.All WW trials, except for the trial in 2011, had two replicates.
Two agronomic traits related to drought tolerance were analyzed: grain yield (GY) and female flowering time (FFT).GY was determined by weighing all grains in each plot, adjusted for 13% grain moisture, and converted to tons per hectare (t/ha), accounting for differences in plot sizes across trials.FFT was measured as the number of days from sowing until the stigmas appeared in 50% of the plants.A summary of means, standard deviations, and ranges of both evaluated traits are available in Table 1.
To conduct the analyses, hybrids considered as outliers were removed (i.e., hybrids that presented phenotypic values greater than 1.5 × interquartile range above the third quartile or below the first quartile) for the GY and FFT traits.The variations in predictive abilities among hybrids of T2, T1, and T0 are widely recognized 31 .However, the primary aim of our study was to compare different prediction methodologies in MET assays.In this study, there were 240 T2 hybrids and 68 T1 hybrids, with T2 hybrids had both parents evaluated in different hybrid combinations, while hybrids being single-cross hybrids sharing one parent with the tested hybrids.Given the realistic nature of our scenario, we have a limited and imbalanced distribution of these hybrid groups, making a fair comparison challenging.Consequently, we opted to construct a training set comprising T2 and T0 hybrids. www.nature.com/scientificreports/

Statistical analysis of phenotypic data
To correct the phenotypic values for experimental design effects, each trial (WW and WS) and environment were analyzed independently to obtain the Best Linear Unbiased Estimator (eBLUEs) for each hybrid, for the two traits evaluated.The estimates were obtained based on the following model: where y(n × 1) is the phenotype vector for f replicates, t sets of p hybrids, and n is the number of observations; µ is the mean; r f × 1 is the fixed effect vector of the replicates; s(t × 1) is the fixed effect vector of the sets; h p × 1 is the fixed effect vector of the hybrids; and e(k × 1) is the residue vector, with e ∼, MVN 0, Iσ 2 e , where I is an identity matrix of corresponding order, and σ 2 e the residual variance.X 1 k × f , X 2 (k × t) e X 3 k × p represents incidence matrices for their respective effects.The eBLUES of each environment were used in further analyses.

Genotypic data
A total of 57,294 Single Nucleotide Polymorphisms (SNPs) markers were obtained from 188 inbred lines, and two testers used as parents of the 265 single cross hybrids.The genotyping by sequencing (GBS) strategyare detailed in Dias et al. 13 .For the quality control, SNPs were discarded if: the minor allele frequency was smaller than 5%, more than 20% of missing genotypes were found, and/or there were more than 5% of heterozygous genotypes.After filtering, missing data were imputed using NPUTE.Then, for each SNP, the genotypes of the hybrids were inferred based on the genotype of their parents (inbred line and tester).The number of SNPs per chromosome ranged from 3121 (chromosome 10) to 7705 (chromosome 1), totalizing 47,127 markers.

Genomic relationship matrix
The additive and dominance genomic relationship matrices were constructed 32 based on information from the SNPs using the package AGHmatrix 33 , following VanRaden 34 and Vitezica et al., respectively.

Genomic prediction
Genomic predictions were performed using the Genomic Best Line Unbiased Prediction (GBLUP) method using the package AsReml v. 4 36 .Two groups were considered: the first group comprised four environments under WW conditions, and the second included four environments under WS conditions.The linear model is described below: where y pq × 1 is the vector of eBLUES previously estimated for each environment with p hybrids and q environ- ments;µ is the mean; b q × 1 is the vector of environmental effects (fixed); u a pq × 1 is the vector of individual additive genetic values nested within environments (random), with u a ∼ MVN 0, I q σ 2 u a + ρ a J q − I q ⊗ A , where A is the genomic relationship matrix between individuals for additive effects, ρ a is the additive genetic correlation coefficient between environments, I q q × q is an identity matrix, J q q × q is a matrix of ones, and ⊗ denotes the Kronecker product; u d pq × 1 is the vector of individual dominance genetic values nested within environments (random), with u d ∼ MVN 0, I q σ 2 u d + ρ d J q − I q ⊗ D , where D is the genomic relationship matrix between individuals for dominance effects, ρ d is the dominance correlation coefficient between environments; e pq × 1 is the random residuals vector with e ∼ MVN 0, Iσ 2 e .The capital letters X pq × q , Z 1 pq × pq and Z 2 pq × pq represent the incidence matrices for their respective effects, 1 pq × 1 is a vector of ones.The (co)variance components were obtained using the residual maximum likelihood method (REML) 37 . (1) Table 1.Summary of means (Mean), standard deviations (SD), minimum (Min), and maximum (Max) for grain yield (GY) and female flowering time (FFT) obtained under irrigated (WW) and water stress (WS) conditions, evaluated in the years 2010 and 2011, at the locations of Janaúba (J) and Teresina (T).www.nature.com/scientificreports/Two alternative models were also used.The first for genomic prediction retained only additive effects by removing u d from Eq. ( 2).The second model was used to estimate the genetic parameters within each environ- ment separately.
The significance of random effects was tested using the Likelihood Ratio Test (LRT) 38 , given by: where LogL c is the logarithm of the likelihood function of the complete model (with all effects included), and LogL r is the logarithm of the restricted likelihood function of the reduced model (without the effect under test).Effect significance was tested by LRT using the chi-square (X 2 ) probability density function with a degree of freedom and significance level of 5% 39 .
The narrow-sense heritability ( h 2 ), the proportion of variance explained by dominance effects ( d 2 ), and the broad-sense heritability H 2 for each trait were estimated following Falconer and Mackay 1996 35 .

Machine learning
Similar to the previous topic, the trials were divided between WW and WS conditions, and the potential of regression trees (RT) was explored using the following three algorithms: bagging, random forest, and boosting 22 .Bagging (Bag) is a methodology that aims to reduce the RT variance 22 .In other words, it consists of obtaining D samples with available sampling replacement, thus obtaining D models f 1 (x), f 2 (x), . . ., f D (x) , and finally use the generated models to obtain an average, given by: This decreases the variability obtained in the decision trees.The number of trees used in Bag is not a parameter that will result in overfitting of the model.In practice, a number of trees is used until the error has stabilized 22 .The number of trees sampled for Bag was set at 500 trees.
Random forest (RF) was proposed by HO 40 and it is an improvement of Bag to avoid the high correlation of the trees and to improve the accuracy in the selection of individuals.RF changes only the number of predictor variables used in each split.That is, each time a split in a tree is considered, a random sample of m variables is chosen as candidates from the complete set of p variables.Hastie et al. 21suggest that the number of predictor vari- ables used in each partition is equal to m = p

Model validation
Genomic predictions were carried out following Burgueño et al. 16 , considering two different prediction problems, CV1 and CV2, which simulate two possible scenarios a breeder can face.In CV1, the ability of the algorithms to predict the performance of hybrids that have not yet been evaluated in any field trial was evaluated.Thus, predictions derived from the CV1 scenario are entirely based on phenotypic and genotypic records from other related hybrids.In CV2, the ability of the algorithms to predict the performance of hybrids using data collected in other environments was evaluated.It simulates the prediction problem found in incomplete MET trials.Here, information from related individuals is used, and the prediction can benefit from genetic relationships between hybrids and correlations between environments.Within the CV2 scenario, two different situations of data imbalance were evaluated.In the first, called CV2 (50%), the tested hybrids were not present in half of the environments, while in the second, called CV2 (25%), the tested hybrids were not present in only 25% of the environments.Table 2 provides a hypothetical representation of this CV1, CV2 (50%), and CV2 (25%) validation scheme.
To separate the training and validation sets, the k-folds procedure was used, considering k = 5 .The set of 265 hybrids was divided into five groups, with 80% of the hybrids considered as the training population, and the remaining 20% hybrids considered as the validation population.The hybrids were separated into sets proportionally containing all the crosses performed

Statement of handling of plants
The authors confirm that the appropriate permissions and/or licenses for collection of plant or seed specimens are taken.

Variance components and estimation of genetic parameters
Estimates of variance components and genetic parameters for GY and FFT under WW and WS conditions, obtained for the joint analysis with the four environments and analyses within each environment, are shown in Table 3 .For the joint analysis, the heritability estimates for GY and FFT were slightly different from those obtained by Dias et al. 13 using the same material, since here, a different statistical model was used to estimate the genetic parameters, and hybrids that were not present in all environments were removed.
The additive variance found for GY and FFT was greater than the variance due to dominance effects, in both WS and WW conditions.For GY, the variances due to dominance effects represented about 33.3% and 31.0% of the genetic variance in WS and WW conditions, respectively.Lower broad-sense heritability was observed for this trait in WW (0.42) when compared to the WS condition (0.53).As for FFT, the variances due to dominance effects represented about 19.9% and 20.8% of the genetic variance in WS and WW conditions, respectively, and the broad-sense heritabilities for FFT were greater in WW conditions (0.71) than in conditions WS (0.56).
For GY, the narrow-sense heritabilities within environments ranged from 0.30 (T11) to 0.38 (J10) under WS conditions and from 0.20 (T11) to 0.57 (J10) under WW conditions.The proportion of genetic variance explained by dominance deviations ranged from 0.04 (T10) to 0.43 (T11) under WS conditions and from 0.10 (T11) to 0.29 (J11) under WW conditions.The broad-sense heritabilities were lower for the experimental tests that had a lower number of repetitions (2010 under WS conditions, and 2011 under WW conditions).
For FFT, the narrow-sense heritabilities ranged from 0.09 (T10) to 0.80 (J10) under WS conditions and from 0.49 (T10) to 0.74 (J10) under WW conditions.The proportion of genetic variance explained by dominance deviations ranged from 0.01 (T10) to 0.26 (T11) under WS conditions and from 0.07 (J11) to 0.23 (T11) under WW conditions.The broad-sense heritabilities were higher for J10 (0.89 and 0.88) under WS and WW conditions.
The Eq. ( 2) is an implicit model to perform MET analyses 45 and provide genetic correlations for additive and dominance effects across environments.This model, reflects on the levels of genotypes-by-environment interaction.Particularly, for GY, the environmental correlations were 0.35 and 0.24 for WS and WW conditions, respectively, indicating an inconsistent ranking of hybrids across environments.As for FFT, the lowest correlation was observed for dominance effects.
For the GBLUP models, the predictive abilities were lower for the CV1 scenario, where the predictions were based only on phenotypic and genotypic records of other related hybrids.However, the predictive ability increased when the predicted hybrid was present in some environments, being intermediate in CV2-50% and higher in CV2-25%.A more notable improvement in predictive abilities was observed when transitioning from CV1 to CV2-50%.Specifically, for GBLUP-A, the average increase was about 107%, 19%, 18%, and 19% for GY-WS, GY-WW, FFT-WS, and FFT-WW, respectively.As for GBLUP-AD, the average increase was 19%, 7%, 12%, and 15% for GY-WS, GY-WW, FFT-WS, and FFT-WW, respectively (Table Sup.4).For GY, considering only CV1 scenario, the average predictive abilities were higher in WW conditions.For FFT, mean predictive abilities for WW conditions were higher than for WS for all scenarios (Table Sup.4).
The environments with the highest broad-sense heritabilities also exhibited the highest average predictive abilities across all evaluated methodologies (Table 3, Figs. 1 and 2).The GBLUP-AD model demonstrated superior predictive abilities for the GY and FFT traits in almost all environments and scenarios (CV1, CV2-50%, and CV2-25%).Conversely, the GBLUP-A performed equally or better than the GBLUP-AD model for FFT in environments where the dominance effect was not significant (T10-WS and J11-WW).However, for GY, in environments where the proportion of dominance genetic variance was close to or greater than the additive variance (J11 and T11 under WS), GBLUP-A exhibited lower predictive ability.
Unlike GBLUP, machine learning methodologies did not show a consistent pattern of increased predictive ability with the presence of phenotypic records of the hybrid to be predicted in correlated environments.Overall, an increase in predictive abilities for GY was observed when the scenario changed from CV1 to CV2, under WS conditions, which was not observed in many environments under WW conditions.For example, for RF, the average predictive abilities in the environments increased by approximately 51% for GY in WS conditions, while a decrease of about 10% was observed when the prediction changed from CV1 to CV2-50% for the same trait in WW conditions (Table Sup.4).As for FFT, Bag drew the most attention, presenting a large standard error in predictive abilities with statistically similar results for CV1, CV2-50% and CV2-25% in WS and WW conditions, in almost all environments.
In environments where dominance effects accounted for a large portion of the genetic variance, the Bag, RF and Boost methodologies showed intermediate results between the two GBLUP models.Among the evaluated Table 3. Estimates of variance components and genetic parameters for grain yield (GY) and female flowering time (FFT) were obtained considering the joint analysis for the four evaluated environments and analyses within each environment, for the irrigated (WW) and water stress (WS) conditions.σ 2 u a : additive genetic variance; σ 2 u d : dominance genetic variance; σ 2 e : residual variance; ρ a: additive genetic correlation coefficient between environments, ρ d : dominance genetic correlation coefficient between environments; h 2 : narrow sense heritability; d 2 : proportion of the variance explained by the dominance effect and H 2 : broad sense heritability.ns and *, not significant and significant at 5% probability of error.machine learning methodologies, Bag and RF produced very similar predictions and Boost presented the lowest predictive ability for GY.As for FFT, machine learning did not show improvement in predictive abilities when compared to GBLUP.

Discussion
We employed both statistical and machine learning methods to evaluate the efficiency of genomic predictions for GY and FFT.These models allow us to leverage information about relationships between hybrids and correlated environments.Three scenarios, CV1, CV2-50% and CV2-25%, were used, each characterized by different degrees of data imbalance.Predicting the performance of new hybrids that have not been evaluated in the field (CV1) is more challenging compared to predicting hybrids that have been evaluated in an unbalanced manner across correlated environments (CV2-50% and CV2-25%) 15,16 .Among the evaluated methodologies, GBLUP stands out as it allows to estimate relationships between individuals based on molecular markers for additive and dominant effects 32,34 .This methodology also allows to incorporate variance-covariance structures to handle correlated environments and unbalanced data 10,14 .As expected, GBLUP showed the highest predictive abilities in the CV2 validation schemes (CV2-50% and CV2-25%) as the predictions benefited from phenotypic records of the same hybrids in other correlated environments.Similar results have been reported in previous studies using wheat inbred lines and maize single cross hybrids [13][14][15][16] .
GBLUP-AD demonstrated greater efficiency to predict hybrids for traits and environments in which the effects due to dominance deviations were significant.As a substantial part of the genetic variance in maize hybrids for GY is attributed to dominance effects 46 , incorporating theses effects in the model significantly improved the prediction of this trait.However, when the dominance effect accounts for a small portion of the genetic variance (as observed for FFT), additive and additive-dominant models tend to show minor improvements in the predictions of the estimated total genetic effects 13,47 .This emphasizes the importance of understand the genetic bases of hybrids which can be decomposed into general combining ability (GCA) and specific combining ability (SCA).For GCA, the additive variance is the major component of the variance 48 and, SCA is largely dependent on genes with dominance or epistatic effects 49 .In this context, gaining a comprehension of the relative magnitudes of GCA and SCA variations is instrumental in guiding the formulation of optimal strategies for hybrid breeding programs, as highlighted in previous research 50 .
One possible reason why tree-based learning models did not benefit from the presence of phenotyped hybrids in correlated environments is related to their shared concept of recursive division 51 .These models aim to find decision rules that naturally partition the prediction space to provide an informative and robust hierarchical model 21,52 .In this context, it is conceivable that the location/year variables played a crucial role in most of the tree construction scenarios, leading to the split of the same hybrids at the beginning of branching and making it difficult for them to be grouped at the last nodes, which are responsible for the predictive response.
Among the evaluated methodologies, Bag showed, in most environments, the same prediction pattern for both scenarios where the predicted hybrid was absent in all environments and where it was absent only in some environments.Bag is a variation of the decision tree that uses resampling of the original data in subsamples (bootstrap) according to the determined number of trees.This process may lead to high correlation between the generated trees, resulting in the same variable consistently being the most important one 21 .As consequence, the same hybrids in different environments may end up at very distant nodes in the construction of the tree.The RF further reduces the dependence between the decision trees by randomly selecting part of the original data and variables to build the trees 22,51 .This allows different variables to have a chance to be at the top in their construction.On the other hand, boost sequentially combines different predictors, fitting a new tree to the residuals of the previous model using a specified loss function (e.g.mean squared error for regression) 53 .It incorporates automatic indirect selection of markers and is generally recommended for regression analysis 17,20 .It is important to note that these variations of decision trees mentioned above are typically used when obtaining accurate predictions is more important than the biological interpretation of the model itself 52 .
One of the advantages of using machine learning methodologies is that they do not require the specification of the trait inheritance, having as an initial hypothesis the possibility of capturing non-additive effects in the genome, which are often not specified in traditional statistical methods 25 .Possibly tree-based machine learning methodologies managed to capture part of the dominance effect, presenting better results than GBLUP-A in environments where dominance represented a large part of the genetic variance.These methodologies are competitive with statistical models and tend to outperform them when applied to large data sets.However, when applied to small training sets, machine learning is probably not able to capture all non-additive information and linear models may perform better 54 .
Tree-based machine learning methodologies (Bag, RF, and Boost) are considered promising for genomic prediction, especially for traits controlled by a few quantitative trait locus (QTL), capturing non-additive effects in their models 25 .Among the machine learning methodologies evaluated in this study, Boost is considered the most sensitive methodology for the variation of traits and environments 25,26,55 .In these studies, carried out with simulated data, the lower the heritability and the greater the number of QTL that control the trait, the lower the Boost prediction efficiency.This fact may be related to the lower values of predictive ability for GY when compared to other methodologies, since this is a quantitative trait with low heritability, greatly influenced by the environment 56,57 .
A possible explanation for the lower values of predictive ability of machine learning methodologies when compared to GBLUP in FFT prediction is that, even though it is a complex trait 58,59 , additive genetic variance contributed with a large proportion of this trait.If the relationship between trait and response is close to a linear model, then a linear approach will probably work well and be superior to a method like a regression tree that does not explore this linear structure 22,60 .
In the context of a hybrid breeding program, identifying high-performance genotypes is essential 61 .However, extensive field trials are needed to identify the best hybrids in the target environment.These trials require resources in terms of people, equipment, and area to carry out the phenotyping of the plants.Furthermore, most crosses are discarded after evaluation due to poor overall performance 62 .Associated with this fact is the trend of stagnating or rising costs of field evaluations, unlike genotyping which tends to decrease 63 .In this sense, the use of genomic information in multiple-environment trials is an alternative to traditional field trials, as it allows to reduce the phenotyping costs and to increase the number of hybrid combinations evaluated when performing the prediction of the genetic value of individuals that were not evaluated in the field 11,64 .
Based on the results of this study, a practical application of cost reduction and the efficiency of genomic selection in a breeding program can be demonstraded 10 .Assuming the cost of an experimental maize trial is $17.00 per plot 65 and the cost of sequencing genotyping (GBS) standard is $25.38 per parental line 66 , the total budget needed for a breeding program would be $108,120.00for phenotyping the 265 single cross hybrids in eight experimental trials using three replications, and $5,076.00for genotyping the 188 parental lines and the two testers.Using the GBLUP-AD model, it was shown that the predictive abilities for the untested single hybrids averaged greater than 0.40 for GY under WW conditions, with an imbalance of 25% of hybrids randomly missing in each environment.Consequently, the cost reduction for the breeding program would be approximately 25% or $27,030.00compared to phenotyping alone, which would cover the costs of genotyping the lines ($5,076.00).
The models proposed in this study can be applied to other crops that use hybrids to explore heterosis, and they can be expanded to include environmental variables to predict non-evaluated environments.In addition, this study demonstrates the application of machine learning methodologies in tests of multiple environments for predicting of hybrids, comparing their performance with GBLUP with non-additive effects, thus highlighting the potential of both methodologies.

Conclusion
Genomic prediction for untested maize single cross hybrids using both statistical and machine learning approaches were applied in MET assays.The results demonstrate that both methodologies are efficient and can be valuable tools in maize breeding programs to accurately predict the performance of hybrids in specific environments.The choice of the best methodology depends on the specific case.To optimize the predictive ability of GBLUP, it is crucial to accurately model the variance components.On the other hand, machine learning methodologies have the advantage of capturing non-additive effects without making any assumptions at the outset of the model.We found that predicting the performance of new hybrids that were not evaluated in any field trials was more challenging than predicting hybrids in unbalanced trials.However, regardless of the methodology used, environments with the lowest heritability showed the lowest predictive abilities, underscoring the importance of conducting well-designed and properly replicated experiments.